August 15, 2014
$17 million loss
Is it possible to leverage screenplay information to predict movie profitability and assist the decision-making process for screenplay selection?
Some features that were tried but failed to produce convincing results: readability index, sentiment analysis, tf-idf, tf-idf with POS tagging.
word2vec - Efficient Estimation of Word Representations in Vector Space (published by Google).
Allows words with similar meaning to be clustered. These clusters can be used as features in a predictive model.
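A minimal sketch of how word clusters could become model features, assuming the word vectors are already available as a numeric matrix with one row per word. The toy 2-D vectors, the word list, the choice of k-means, and the helper `cluster_features` are all hypothetical and only illustrate the idea; the original slides do not say which clustering method was used.

```r
# Toy 2-D "embeddings"; real word2vec vectors have many more dimensions.
# All words and values below are made up for illustration.
embeddings <- rbind(
  gun     = c(0.90, 0.10),
  knife   = c(0.80, 0.20),
  bullet  = c(0.95, 0.15),
  kiss    = c(0.10, 0.90),
  love    = c(0.20, 0.80),
  wedding = c(0.15, 0.95)
)

# Group words with similar vectors (k-means is an assumption here).
set.seed(42)
km <- kmeans(embeddings, centers = 2, nstart = 10)
word_cluster <- km$cluster  # named integer vector: word -> cluster id

# Hypothetical helper: count how often each cluster occurs in a token list.
cluster_features <- function(tokens, word_cluster, k) {
  known <- tokens[tokens %in% names(word_cluster)]  # drop unknown words
  ids <- word_cluster[known]
  as.vector(table(factor(ids, levels = seq_len(k))))
}

screenplay <- c("the", "gun", "and", "the", "knife", "before", "the", "kiss")
features <- cluster_features(screenplay, word_cluster, k = 2)
print(features)
```

The resulting vector has one count per cluster, so every screenplay is described by the same fixed number of features regardless of its length. Cluster ids are arbitrary, so which position holds which count can differ between runs with a different seed.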
\[ \hat{y} = x_{budget} + \sum_{i=1}^{n} x_{i, word2vec} \]
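The formula above reads as an additive model with the budget and the word2vec cluster features as predictors. One way to sketch it, assuming an ordinary linear regression (the slides do not state the actual model), is shown below. The data frame `movies`, its column names, and all values are synthetic and exist only to make the example runnable.

```r
set.seed(1)
n <- 50

# Synthetic data for illustration only; not taken from the original study.
movies <- data.frame(
  budget    = runif(n, min = 1, max = 100),  # hypothetical budget, in $ million
  cluster_1 = rpois(n, lambda = 5),          # hypothetical cluster counts
  cluster_2 = rpois(n, lambda = 5)
)
movies$profit <- 0.5 * movies$budget + 2 * movies$cluster_1 -
  movies$cluster_2 + rnorm(n, sd = 5)

# Fit profit as an additive function of budget and cluster features.
fit <- lm(profit ~ budget + cluster_1 + cluster_2, data = movies)
print(coef(fit))

# Predict profit for a new, hypothetical screenplay.
new_movie <- data.frame(budget = 40, cluster_1 = 6, cluster_2 = 3)
prediction <- predict(fit, newdata = new_movie)
print(prediction)
```

With real data, each `cluster_i` column would hold the per-screenplay count (or share) of words falling into cluster i, and the fitted coefficients would indicate how strongly each cluster is associated with profitability.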
rmarkdown::render("MillionDollarStory_Presentation.Rmd")